Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix file cache concurrency #420

Merged
merged 11 commits into from
Dec 18, 2021
Merged

Fix file cache concurrency #420

merged 11 commits into from
Dec 18, 2021

Conversation

Hellysonrp
Copy link
Contributor

Fixes #419

I was going to open the PR after my tests, but I decided to open it now to get feedback about the code.

Changes

Use p-queue to avoid the execution of evictOldestIfNecessary concurrently.
I also removed the recursion from the function.
I don't like recursion and prefer to avoid it, only using it if it's really necessary. I always think of the possibility of a stack overflow.

I removed the await when calling evictOldestIfNecessary.
I want some feedback about this change. See the commit message for more info if needed.

Hellysonrp and others added 7 commits December 3, 2021 00:30
This commit also removes the `await` from every stream creation. The eviction will be handled totally assyncronously. The only drawback is the possibility of  exceeding the cache limit for a moment, until the next execution of `evictOldest`.
This will only be a problem if the cache is set too close to the remaining disk space, which I wouldn't recomend.

I also removed the recursion.
(Strict mode in TS 4.4 enables useUnknownInCatchVariables, so this is redundant.)
@codetheweb
Copy link
Collaborator

I made a number of changes, let me know what you think.

  • I think we need to wait for the eviction process to finish in .cleanup(), so I changed it back to an async function that waits for the queue to be empty. Less sure about needing to wait inside of .on('close', ...)--if you have strong feelings about that we can change it back to what you had.
  • What was the reasoning for originally making the queue a static property?
  • I removed the empty queue checks inside of .evictOldestIfNecessary() as running .evictOldest() isn't expensive.
  • Removed all recursion and just shoved it inside a loop.

Logging now looks like:

[0] 2021-12-03T16:06:10.363Z muse Evicting oldest files...
[0] 2021-12-03T16:06:10.366Z muse 45b92f0c6a33e3d23043ccbca8fd1b5895924b3d06327f2042cb341b0eea88100539d893776e1d22e16a2f4659f9074af760bdeacaf6d6216d360636fda7408e has been evicted
[0] 2021-12-03T16:06:10.367Z muse 1 files have been evicted

@@ -58,10 +61,14 @@ export default class FileCacheProvider {
const stats = await fs.stat(tmpPath);

if (stats.size !== 0) {
await fs.rename(tmpPath, finalPath);
}
try {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, do we need to be catching and silently continuing here? I don't think this would throw because of race conditions anymore.

@Hellysonrp
Copy link
Contributor Author

What was the reasoning for originally making the queue a static property?

Each guild has a player.
Each player has a file cache.
If the queue isn't static, each file cache will have its own queue. Consequently, each guild will have its own queue.

The error might occur if two or more guilds are using the bot at the same time (yes, I had this problem when I was testing with the mutex).

@Hellysonrp
Copy link
Contributor Author

if you have strong feelings about that we can change it back to what you had

I have a tendency of overthinking sometimes. What I thought was about a possible delay before returning the write stream, but as you said, evictOldest isn't expensive, so I was just overthinking again 😅

@codetheweb
Copy link
Collaborator

Ah, should have been thinking about multiple guilds. Makes sense.

@Hellysonrp
Copy link
Contributor Author

I'll update my bot and test it today.

@Hellysonrp
Copy link
Contributor Author

Today the bot died with the same error.
It doesn't appear to be related to concurrently trying to delete a file though.
Maybe between all the errors that happened before, some files exists in the database but not in the folder?

Screenshot_27

I don't think this will happen in clean installs, but it might be worth to expect this kind of error (file in the database but not in the folder) to avoid crashes.

@codetheweb
Copy link
Collaborator

codetheweb commented Dec 7, 2021

You're probably right. .removeOrphans() currently treats the database as the source of truth, but should probably use the filesystem as the source of truth instead.

Would it make sense to update .removeOrphans() so it loops over table rows rather than files? Or should the existing code stay there and .removeOrphans() checks both directions?

@Hellysonrp
Copy link
Contributor Author

I think it should check both directions, but I don't like the idea of checking some files twice. It might impact the startup time of bots with big caches.
Maybe we could do something to mark the file as 'already checked' and avoid checking it twice. One example would be something like:

  • Add a column to the file cache table, that defaults to 0 and can be 1
  • Set this column to 0 for all records in the beginning of .removeOrphans()
  • Keep the current code, but update the database to set the column to 1 if the file is found
  • Delete all other database records after finishing the current code

This is only an example, I don't know if it would be faster than looping through every file and looping through every database record, because of the update to the database.

I might be overthinking again lol 😄

@codetheweb
Copy link
Collaborator

Just pushed an update that checks both directions, let me know how that works.

I think adding a column is probably overkill and could lead to race conditions if you have multiple instances of Muse sharing the same database for some reason.

@Hellysonrp
Copy link
Contributor Author

I think this can be merged.
My bot has been running for days without problems.

@codetheweb
Copy link
Collaborator

Looks like I'll have to re-evaluate my approach to the GitHub Workflows I just added since it seems Secrets aren't accessible when running from a forked repo.

@codetheweb codetheweb merged commit d8fc7d3 into museofficial:master Dec 18, 2021
@codetheweb
Copy link
Collaborator

Released in v0.1.1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bot dies trying to delete the same cache file more than one time
2 participants